Book Reviews 


Challenges in Indexing Electronic Text and Images. Edited by
Raya Fidel, Trudi Bellardo Hahn, Edie M. Rasmussen, and
Philip J. Smith. Medford, NJ: Published for the American
Society for Information Science by Learned Information; 1994:
ix + 306 pp. (ASIS Monograph Series). Price: $39.50. (ISBN
0-938734-76-8.)


It is ironic that at a time when there is exponential growth in 
the number of documents available in machine-readable form, 
with concomitant predictions that the need for human content 
analysis will disappear owing to the availability of natural-lan- 
guage searching on full text, the number of publications on hu- 
man indexing has increased dramatically. 

The work under review contains 15 papers in four sections; 
each of the sections was edited by one of the ASIS members 
named on the title page. The papers emanate from ASIS annual 
meetings, although the preface to the book does not state this. 
In this review, I attempt (a) to distill the key points in each 
paper that represent an innovation with respect to traditional 
indexing, and (b) to evaluate the design and structure of the 
book as an “information package.” 

The first section, “Indexing and Accessing Images,” is edited 
by Trudi Bellardo Hahn. Her introduction notes that visual 
documents preceded textual ones in human history, but sys- 
tems to index the former are much more recent than those de- 
signed for the latter. 

In “User Types and Queries: Impact on Image Access Systems,”
Lucinda Keister reports on a study of user query data
performed by the Prints and Photographs Collection of the
National Library of Medicine in order to develop its automated
retrieval system and build a thesaurus. Differences in the
queries of picture researchers, health professionals, and the
academic community were identified through analysis of the
staffers’ log.

The article is well written and generously illustrated. Given 
the focus of this book on indexing, however, the lack of discus- 
sion of how query terms are translated into controlled vocabu- 
lary leaves the reader feeling that the paper is incomplete. Also, 
it would have been desirable to include sample indexing re- 
cords for the visual materials in addition to, or instead of, the 
medical illustrations. 

In “Thinking Ambiguously: Organizing Source Materials 
for Historical Research,” Joseph Busch explains the differences 
between historical and factual questions, noting that answers to 
the latter are unique and unambiguous, while multiple answers 
may exist for the former. There is, for example, a single correct 
answer at present to the query “What is the population of Los 
Angeles?” but many answers as one goes back in time. Busch 
surveys the information sources and needs of general historians 
and of art historians, noting the desire of both for “cross-indexing”
(p. 26), a term not recognized by indexing specialists but
presumably referring to the provision of multiple access points.

The bulk of the paper focuses on the development of a
computerized system for two art history collections. A lengthy
appendix enumerates the fields in the database. I found it
interesting that the discussion of the relationships between art
objects (pp. 32-33) uses much of the same terminology found
in the revised thesaurus standard (NISO, 1994), for example,
part-whole and polyhierarchy. Busch concludes by observing
that the structured approach to information management is
time-consuming and may be replaced by informal linking in
full-text and multimedia databases.

© 1994 John Wiley & Sons, Inc.

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE. 45(9):718-724, 1994

In “Analyzing Art Objects for an Image Database,” Lois 
Lunin raises the question of whether verbal description is nec- 
essary when an information system includes images. After pre- 
senting a history of fiber art, she returns to this question and 
concludes that a vocabulary can be developed to describe non- 
representational works of art. Like Busch, she enumerates the 
fields in a database about images; many of the fields are the 
same as in his paper. Also like Busch, she notes the time-con- 
suming nature of completing the records (p. 69). 

Lunin then considers the advantages of analog vs. digital 
images and predicts that we will not see linked hypertext and 
hypermedia in humanities knowledge bases in our lifetime. (I
must check out her reference to tasting “holographic
chocolate”!)

Section II, “Indexing of Hypermedia,” is introduced by
Raya Fidel, who raises the question of whether hypermedia re- 
quires a new method of indexing. It is unclear why Nancy Mul- 
vany’s paper on online help systems was sandwiched in be- 
tween the two papers on indexing and hypertext, but I shall 
summarize those before describing Mulvany’s. 

Gary Marchionini, in “Designing Hypertexts: Start with an
Index,” begins by explaining that the links in hypertext
constitute a type of indexing. After surveying hypertext software,
published literature, and applications, he proceeds to a struc- 
tural analysis of links, comparing them to see also references in 
indexes and considering the problem of managing numerous 
links in large systems. Marchionini’s advice to hypertext au- 
thors is to compile the index to their work before they start 
writing—after they identify the primary facets of the topic and 
develop a controlled vocabulary. 

In “Hypertext and Indexing,” Peter Liebscher echoes some 
of Marchionini’s themes, suggesting that indexers should con- 
struct hypertext. He contrasts print documents with hypertext 
by stating that the former have fewer explicit links in the form 
of cross-references between sections or references to notes. The 
sequential structure of a printed text has implicit links. 
Liebscher views both indexes and tables of contents as docu- 
ment surrogates. He suggests that an index to a book can be 
developed as the outline is being filled in, not necessarily after 
the text is complete. 

While Marchionini labeled only see also references links, 
Liebscher also considers heading-locator combinations in in- 
dexes links, with the relation “represents,” and he argues that 
the same relation inheres in hypertext links, which he views as 
embedded index terms. Relationship types are coded through 
typed links, which are analogous to the thesaurus codes BT, 
NT, and RT. Liebscher concludes that the basic difference
between hypertext and printed documents lies in their physical
structure; an index rearranges the conceptual structure of a text
just as hypertext does.

CCC 0002-8231/94/090718-07

Nancy Mulvany introduces her paper, “Online Help Sys- 
tems: A Multimedia Indexing Opportunity,” by noting that she 
creates indexes to single reference works, not to multiple
documents. She considers online help systems “woefully
inadequate,” owing to “a lack of index structures, and awkward
software tools” (p. 91). She cites a study showing that users prefer printed
documentation to online help, but notes that it is easier to keep 
online documentation current. 

Mulvany observes that the structure of a printed index is 
transparent to users; they find the information they seek 
quickly and do not think about how they were led to it. [This is 
why indexers are underpaid; their work seems so easy.] Mul- 
vany provides a detailed analysis of the online help in Micro- 
soft’s Word for Windows, showing how tedious it is to use as 
compared with a printed index. She suggests that the traditional 
book index structure should be incorporated in online help be- 
cause synthesized headings and subheadings are easier for users 
to work with than boolean search strategies. She also notes the 
desirability of indexing online tutorials presented in the form 
of digitized video, as well as the need to develop software tools 
that will integrate access to a tutorial and the corresponding full 
document seamlessly. 

The section ends with Dagobert Soergel’s 45-page paper, 
“Information Structure Management: A Unified Framework 
for Indexing and Searching in Database, Expert, Information- 
Retrieval, and Hypermedia Systems.” While I have no dis- 
agreement with Soergel’s analysis, I found this multidimen- 
sional comparison of index structures less useful than the com- 
parisons of only two structures in the preceding chapters. I fail 
to see any brilliant theoretical insights in the statement that a
search term bears the relationship (link) dealt-with-in to media
objects, or in calling a set of documents a neighborhood (p.
136). Applying one of Soergel’s constructs to this review, we
could say that the link type is commented-by, coded specifically
as “Connection 2 [Media object A criticized-by Media object
B].” My entire review does not deal with his paper; thus it is in
a “document offspring neighborhood” (p. 139). (If Soergel is
uncomfortable in this neighborhood, he might consider
moving!)

Philip Smith introduces Section III, “Computer Support
Tools for Indexers,” by noting that the contributors believe that
human indexing will not soon be replaced by automated pro- 
cedures, but that the computer can be used to enhance index- 
ers’ performance. Technically, four out of the five papers in this 
section are outside the scope of the book as they do not deal 
with electronic text, but rather with electronic tools for indexers 
analyzing printed documents. 

Susanne Humphrey begins her paper, “Knowledge-Based 
Systems for Indexing,” with a review of the literature demon- 
strating that computers cannot do concept indexing. She then 
describes the rules for indexing with Medical Subject Headings 
at the National Library of Medicine (NLM), noting the guid- 
ance provided by the printed thesaurus and the additional as- 
sistance of NLM’s Automated Indexing and Management Sys- 
tem, which paved the way for a knowledge-based system. Hum- 
phrey has published papers about MedIndEx in a variety of 
journals and conference proceedings, but this version is partic- 
ularly good, with many useful references and well-positioned 
figures. 

In “Computing Support for Indexing at Petroleum Abstracts:
Design and Benefits,” John Bailey first provides a chronology
of automated support for the thesaurus and then describes
the current system’s operation, referring to many figures
that have insufficient annotations. For example, Figure 5,
Hierarchy Display, shows the terms “Programing [sic], Computer
Programing, Artificial Intelligence, Dynamic Programing,
Software,” all aligned and with no coding to indicate hierarchical
levels. For those familiar with Hodge’s (1992) survey of
automated support to indexing, the system described in
Bailey’s paper will not be viewed as innovative.

Ronald Buchan’s “Computerized Development and Use of 
the NASA Thesaurus” includes a detailed history of that con- 
trolled vocabulary, tracing its roots to the year 1916, and re- 
porting that the first edition was based on the Thesaurus of 
Engineering and Scientific Terms, issued in 1967. There are 
inconsistencies in referring to the latest edition of the NASA
Thesaurus: page 187 says it is the fifth, with the report number
(SP-7064), while page 190 associates that same number with a
sixth edition (1988). The bibliography omits edition statements,
but one may infer from the chronological sequence (pp. 197-198)
that the fifth edition appeared in 1988.

Buchan notes that the full hierarchy of the NASA Thesaurus 
is not available online (p. 191). Many printed thesauri have 
better displays than their online counterparts (Weinberg & 
Cunningham, 1988). One may relate this point to Mulvany’s 
observation (see above) that the indexes to print manuals are 
superior to online help. 

June Silvester and Michael Genuardi also describe a NASA 
project in their paper, “Machine-Aided Indexing from the
Analysis of Natural-Language Text.” The computer suggests
index terms after “reading” a title and abstract. The knowledge 
base translates the “semantic units” extracted from the text 
into terms from the controlled vocabulary. Like Bailey and Bu- 
chan, the authors of this chapter provide a history of the MAI 
project, whose first goal was to translate terms from the Defense 
Technical Information Center into NASA terms. The detailed 
description of the processing of the text to isolate semantic 
units will be useful to managers of secondary services who are 
contemplating implementing machine-aided indexing. 

Silvester and Genuardi also consider the effects on the 
knowledge base of changes to the thesaurus, as well as how the 
knowledge base can serve as an aid to thesaurus construction. 
They conclude by suggesting that in a full-text environment the 
system could serve for totally automatic indexing. The Appen- 
dix of stopwords is preceded by an introduction that describes 
the generations of growth and contraction of the list. Appendix 
B contains an outline of the logic for machine-aided indexing 
at NASA. 

The authors note that the present knowledge base does not 
allow for inheritance from thesaurus hierarchies (p. 203). The 
system described could probably be categorized more
accurately as one that employs rudimentary natural language
processing. Silvester and Genuardi admit to the limitation of the
system “that the semantic unit is restricted to the level of the 
phrase” (p. 212). Although they may have inappropriately used 
the vogue word knowledge base (see Weinberg [1990] for fur- 
ther discussion of this phenomenon), their paper is valuable for 
its procedural details. 

The section concludes with “Computerized Tools to Sup- 
port Document Analysts,” by Philip Smith, Lorraine Nor- 
more, Rebecca Denning, and Wayne Johnson. The paper 
begins with descriptions of content analysis (abstracting and 
indexing) at Chemical Abstracts Service (CAS) and its quality 
control reviews. The most common problem with abstracts involves
correct classification, that is, “choosing appropriate section and
subsection assignments and . . . making needed cross-references”
(p. 223). The table reporting indexing problems features
infelicitous wording in that all of the categories under the
heading “Problem Area” are stated positively, for example, “Used
correct substance” (p. 225).

The bulk of the paper contains an analysis of the tasks
performed by CAS editors, with quotations from the “thinking
aloud” protocol analyses conducted by the authors. The
authors then categorize the problems, such as “Typographic
Errors,” under the rubric “Cognitive Processes” (p. 230). I find
this vogue term unnecessary; the identical problems in human
indexing have been reported for decades without this label.

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE—October 1994 719

The authors then suggest computerized tools that would en- 
hance the quality of indexing. Many of these are well estab- 
lished in other systems, for example, providing online access to 
“samples of controlled vocabulary terms and associated . . .
text modifications” (p. 231). Hodge (1992) has shown that 
prior indexing records are valuable models. Smith et al. also 
suggest conventional approaches to improving quality, such as 
group discussions for document analysts and editors. Other 
conventional suggestions are that analysts acquire subject 
knowledge by reading review articles in specialized areas of 
chemistry and that they experiment with searching to see the 
effects of indexing on precision and recall. 

In suggesting that allowance be made for analysts’ personal 
styles, the authors note that indexers disagree “regarding the 
worth of suggested improvements. One example was the idea 
that permutations of keywords be shown as the actual words 
rather than number strings” (p. 232). I fail to understand how 
strings of numbers can be helpful in indexing. 

The authors conclude by looking ahead to the possibilities
of incorporating natural language processing, statistical
techniques, and frame-based approaches. Not missing a buzzword,
they express their goal “to search for and present information
intelligently so as to provide immediate strategic information
concerning choice of indexable items to analysts . . .” (pp.
236-237, italics mine).

Since Humphrey has actually developed a frame-based sys- 
tem, it would have been logical for her paper to be the final one 
in the section and for Smith et al.’s to have been first. Then the 
sequence would have represented a true hierarchy of computer- 
assisted indexing: (1) analysis of human indexing and sugges- 
tion of computer aids (Smith et al.); (2) availability of a ma- 
chine-readable thesaurus (Bailey); (3) use of computers to assist 
in thesaurus maintenance and reindexing (Buchan); (4) use of 
natural language processing to suggest candidate terms to in- 
dexers (Silvester and Genuardi); and (5) use of knowledge bases 
to assist human indexers (Humphrey). 

The final section, “Indexing and Retrieval from Full-Text,” 
does indeed logically follow the first three. It is edited by Edie 
Rasmussen, whose introduction approaches a state-of-the-art 
review (more on this below).

Donna Harman sets two goals for her paper, “Automatic 
Indexing”: to provide a tutorial on single-term indexing, and to 
survey advances in automatic indexing beyond the extraction 
of single terms. She begins by discussing the definition of record 
in a database, citing her own research on the consequences of 
decisions regarding record size (e.g., paragraph, section). She
then proceeds to define word, noting the problems introduced 
by hyphens and punctuation marks. Harman also considers the 
indexability of numbers, which can greatly increase the term 
count. She advises studying the corpus to be indexed before 
making decisions on the way special characters and one-letter 
words should be handled. This section is germane to the draft 
revision of the American standard for indexes (NISO, 1993), 
which in its glossary entry for “non-displayed index” suggests 
that all inverted files are identical: Harman makes clear that the 
algorithms for building these indexes may differ substantially. 
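The effect of such decisions on the term set can be seen in a short Python sketch. The function and its three switches (splitting at hyphens, keeping numbers, a minimum word length) are hypothetical illustrations, not Harman's algorithm; they simply show that two reasonable inverted-file builders can extract different terms from the same sentence.

```python
import re


def tokenize(text, split_hyphens=True, keep_numbers=False, min_length=2):
    """Extract candidate index terms from text under explicit, tunable rules."""
    text = text.lower()
    if split_hyphens:
        # Treat "on-line" as two words, "on" and "line".
        text = text.replace("-", " ")
    # Letters, digits, and hyphens form words; other punctuation separates them.
    raw_tokens = re.findall(r"[a-z0-9-]+", text)
    terms = []
    for token in raw_tokens:
        token = token.strip("-")
        if len(token) < min_length:
            continue  # drops empty strings and one-letter words such as "a"
        if token.isdigit() and not keep_numbers:
            continue  # numbers can greatly inflate the term count
        terms.append(token)
    return terms


sample = "The on-line catalog listed 1993 records."
print(tokenize(sample))
print(tokenize(sample, split_hyphens=False))
print(tokenize(sample, keep_numbers=True))
```

With hyphens kept, "on-line" survives as a single term; with them split, it becomes "on" and "line," so a search on either form behaves differently depending on the choice made at indexing time.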

In discussing stop lists, Harman uses the expression “fluff
words” (p. 251) for words such as “below” and “near,” which
do not have high frequency and may occasionally be useful in 
retrieval. As in the case of special characters, she suggests de- 
veloping a customized stop list for a particular application by 
examining a word frequency listing. She lists the 23 stop words 
that she selected for the Wall Street Journal, noting that com- 
mercial systems have even fewer. 
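Harman's procedure of deriving a stop list from a word frequency listing can be illustrated with a short Python sketch. It is not her code: the toy corpus, the simple regular-expression tokenizer, and the top_n cutoff are assumptions, and the output is meant only as a list of candidates for a person to review, not a finished stop list.

```python
import re
from collections import Counter


def frequency_listing(documents):
    """Count how often each lowercased word occurs across the corpus."""
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts


def candidate_stop_words(documents, top_n):
    """Propose the top_n most frequent words as stop-word candidates.

    Frequency alone cannot show whether a word such as "near" is worth
    keeping for retrieval, so the result still needs human review.
    """
    return [word for word, _ in frequency_listing(documents).most_common(top_n)]


def remove_stop_words(text, stop_words):
    """Return the words of text that are not on the stop list."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words]


corpus = [
    "The market rose and the index fell",
    "The dollar and the yen",
]
stop_list = set(candidate_stop_words(corpus, top_n=2))
print(sorted(stop_list))
print(remove_stop_words("The yen rose and fell", stop_list))
```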


Harman then discusses stemming, comparing three of the 
most widely used algorithms and reiterating her theme that 
whether to use this feature depends on the size of the database 
and the nature of its terminology. She cites research findings 
that only weak stemming (e.g., removal of plurals) improves 
retrieval, and that there is little use of truncation [the obverse 
of stemming] in online searching (p. 254). 
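Weak stemming of the kind Harman found helpful, essentially the removal of plurals, needs only a handful of suffix rules. The Python sketch below follows the pattern of the simple "S" stemmers discussed in the retrieval literature; the exact rule set, the exception lists, and the three-letter length guard are assumptions for illustration, not the algorithms compared in her chapter.

```python
def weak_stem(word):
    """Conflate simple plural forms; other words pass through unchanged.

    Only the first rule whose suffix and exception test both succeed is
    applied. The length guard (an added assumption) leaves short words
    such as "is" or "gas" alone.
    """
    w = word.lower()
    if len(w) <= 3:
        return w
    if w.endswith("ies") and not w.endswith(("eies", "aies")):
        return w[:-3] + "y"   # queries -> query
    if w.endswith("es") and not w.endswith(("aes", "ees", "oes")):
        return w[:-1]         # phrases -> phrase (drop only the final "s")
    if w.endswith("s") and not w.endswith(("us", "ss")):
        return w[:-1]         # documents -> document
    return w                  # thesaurus, class


print([weak_stem(w) for w in ["queries", "phrases", "documents", "thesaurus", "class"]])
```

Because each rule strips at most one or two characters, such a stemmer rarely conflates unrelated words, which is one plausible reason weak stemming is the safer choice.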

In discussing advanced techniques, Harman indicates 
which work well only in small test collections and which scale 
up. She notes the poor performance of automatic document 
ranking algorithms and provides reasons for this along with 
suggestions as to how the algorithms can be improved. She then 
explains query expansion techniques, that is, adding terms to a 
search statement to enhance retrieval, possibly from a thesau- 
rus that is constructed automatically. 

Phrase matching is then examined; syntactic as well as sta- 
tistical methods are surveyed. Harman concludes that the use 
of phrases is important only in large full-text databases. In con- 
sidering a combination of techniques, she admits that we do 
not have a methodology for developing the ideal mix. Her 
honest assessment of the performance of automatic indexing 
algorithms, combined with her clear exposition of them, is es- 
pecially praiseworthy. 

“The Role of Linguistic Analysis in Full-Text Retrieval,” by 
Amy Warner, begins by noting that retrieval from full text is 
often poor because the documents are neither abstracted nor 
assigned index terms. Warner reports that there is a small 
amount of linguistic research in the field of information re- 
trieval and cites the literature discussing the relationship be- 
tween the two disciplines, including opinions that the “results 
[of linguistic approaches] do not justify the added complexity” 
(p. 266). 

Warner calls the search capabilities of commercial systems 
“linguistically crude” (ibid.), as they ignore both grammatical 
structure and meaning. She surveys linguistic theory, which is 
developed without concern for practical applications, as well 
as natural language processing, which may have practical or 
theoretical goals. Warner provides a useful table analyzing the 
levels of linguistic processing in various systems (p. 270) and 
echoes some of Harman’s points, for example, regarding the 
value of stemming. She illustrates the difference between statis- 
tically related terms and semantically related ones. Warner 
concludes that “it is not surprising that applications of
linguistics to information retrieval systems have yielded little
improvement in system performance” because “[w]e know
comparatively little about the regular linguistic patterns of
documents and queries” (p. 274).

The final paper, “Text Based Applications on the Connection
Machine,” by Brij Masand, Stephen Smith, and David
Waltz, begins by noting that as machine-readable text
proliferates, human processing “becomes the bottleneck and/or
dominant cost” (p. 277). The authors’ use of the term “keyworders”
(p. 278) for indexers makes it clear that they do not have a
background in library and information science; the fact that the
majority of their bibliographic references are to computer
science journals confirms this.

Masand et al. illustrate the Connection Machine Document 
Retrieval System (CMDRS) with the query “Iran contra arms 
deal,” which is broken up into four “query words” that are 
searched by parallel processors. Relevance feedback is used to 
modify the initial set retrieved. 

To compress the large amounts of text, a high number of 
stop words is used—368. The authors of the preceding chapters 
have pointed out the disadvantages of this. Inverse document 
frequency is used for weighting. The authors refer to another 
system that uses “an inverted index . . . , allowing a more
flexible retrieval than was . . . possible with . . . CMDRS” (p.
279). This is an interesting admission, because the hallmark of
parallel processing is the lack of inverted indexes. Described
next is Wide Area Information Server (WAIS), which accepts 
natural language queries and translates them into the com- 
mand languages of various systems. The authors note that 
while the “software is available at no cost, it comes with no 
support” (p. 281). 
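Since the exact weighting formula used in CMDRS is not given above, the following is only a generic, serial Python sketch of inverse document frequency weighting applied to the four query words of the "Iran contra arms deal" example. The toy documents, the log(N/df) form of the weight, and the additive scoring are assumptions; nothing here models the Connection Machine's parallel processors.

```python
import math


def inverse_document_frequency(documents):
    """Map each term to log(N / df), where df counts documents containing it."""
    n = len(documents)
    doc_freq = {}
    for doc in documents:
        for term in set(doc):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n / df) for term, df in doc_freq.items()}


def rank(query_words, documents):
    """Order document indexes by the summed IDF of the query words they contain."""
    idf = inverse_document_frequency(documents)
    scored = []
    for index, doc in enumerate(documents):
        terms = set(doc)
        score = sum(idf.get(w, 0.0) for w in query_words if w in terms)
        scored.append((score, index))
    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored]


docs = [
    ["iran", "arms", "deal"],
    ["arms", "budget"],
    ["contra", "iran", "arms", "deal"],
    ["weather", "report"],
]
query = "Iran contra arms deal".lower().split()
print(rank(query, docs))
```

A rare word such as "contra" contributes far more to a document's score than a common one such as "arms," which is the point of the weighting.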

Automatic classification is described next, with the goal of 
reducing “the human costs involved in producing. . . systems 
for information retrieval” (ibid.). The authors consider “[a] pri- 
ori classification . . . the low technology status quo” (p. 284), 
but admit that it is difficult to search on broad categories with 
boolean queries or relevance feedback. They then consider five 
aspects of automatic classification: consistency, size, speed, 
change, and cost. They claim that their system “can perform 
with the same consistency as human editors and yet provide a 
total cost savings” (p. 285). 

The automatic classification system they developed for the 
Census Bureau uses memory-based reasoning, with a training 
database of previously classified census returns. This is con- 
trasted with an earlier expert system, which was less successful. 
Memory-based reasoning is a variation of “nearest neighbor
classification” (p. 287). Masand et al. describe the complexity 
of finding an exact match to free-text strings in which people 
describe their business and note the high storage costs as the 
database increases in size. They illustrate the failure of simplis- 
tic matching on the basis of words, without consideration of 
their significance. 

Evaluation measures for the system are accuracy and cover- 
age. The first refers to correct classification, and the second rep- 
resents the percentage of cases that are not referred to human 
coders. The authors admit that “both cannot be simultaneously
maximized” (p. 289), which brings to mind the inverse rela- 
tionship between precision and recall. The compromise se- 
lected for the classification scheme was to increase accuracy 
and reduce coverage. A second application of the system was 
classifying news stories into broad categories; precision was 
particularly low for subjects (Table 1). 
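Memory-based reasoning as the authors describe it behaves like a nearest-neighbor classifier with a rule for referring doubtful cases, and a small Python sketch shows why accuracy and coverage pull against each other. Everything concrete in it is an assumption for illustration: the Jaccard word-overlap similarity, k = 3, the vote-share threshold, and the invented business descriptions. It is not the Census Bureau system.

```python
from collections import Counter


def case(text, label):
    """Store a previously classified description as (word set, category)."""
    return frozenset(text.lower().split()), label


def similarity(a, b):
    """Jaccard word overlap between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(description, memory, k=3, threshold=0.6):
    """Vote among the k most similar stored cases.

    Returns the winning category, or None when the winner's share of the
    votes is below threshold, meaning the case goes to a human coder.
    """
    words = frozenset(description.lower().split())
    neighbors = sorted(memory, key=lambda c: similarity(words, c[0]), reverse=True)[:k]
    label, votes = Counter(lbl for _, lbl in neighbors).most_common(1)[0]
    return label if votes / len(neighbors) >= threshold else None


def evaluate(test_cases, memory, **options):
    """Accuracy: share of automatically coded cases that are correct.
    Coverage: share of all cases not referred to human coders."""
    coded = correct = 0
    for description, true_label in test_cases:
        predicted = classify(description, memory, **options)
        if predicted is not None:
            coded += 1
            if predicted == true_label:
                correct += 1
    accuracy = correct / coded if coded else 0.0
    return accuracy, coded / len(test_cases)


memory = [
    case("retail grocery store", "retail"),
    case("grocery market", "retail"),
    case("auto repair shop", "services"),
    case("repair shop shoes", "services"),
    case("shoe store", "retail"),
]
tests = [
    ("grocery store", "retail"),
    ("shoe repair shop", "services"),
    ("store repair", "services"),
]
for threshold in (0.6, 0.9):
    print(threshold, evaluate(tests, memory, threshold=threshold))
```

On this invented data, raising the threshold from 0.6 to 0.9 raises accuracy from 2/3 to 1.0 while coverage drops from 1.0 to 1/3, a toy version of the compromise described above.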

The authors conclude by suggesting other applications of 
automatic classification, such as medical diagnosis, but admit 
to the problems that dramatic increases in database size intro- 
duce. At the same time they maintain that hand-crafted frame 
structures will be inadequate for massive databases. This is 
truly a fitting closing chapter to the book, as it leaves the reader 
pondering the question of whether human indexing will be re- 
placed by supercomputers. 

It is common for reviewers of collections to state, “The qual- 
ity of the papers is uneven.” There is wide dispersion of the 
papers on the spectrum of practical- to research-orientation, 
but they all have value. For this work, one may observe that the 
quality (i.e., the nature) of the editors’ introductions is uneven. 
The preface to the book is unsigned and is essentially a descrip- 
tion of the papers combined with the expression of the hope 
that they will stimulate further research. Fidel’s and Smith’s 
brief introductions provide abstracts of the papers in their re- 
spective sections. (There is no abstract at the head of each pa- 
per.) Hahn’s longer introduction is the only one that gives the 
dates of the original presentations at ASIS annual meetings and 
notes the extent of updating. In addition to describing the cre- 
dentials of the contributors to her section, she provides a sub- 
stantive discussion of the problems of analyzing visual images. 
She refers to published papers, but does not give formal refer- 
ences. Rasmussen’s five-page introduction is an excellent re- 
view of the literature on full-text indexing and retrieval, with 
18 bibliographic references. 

The diffuse nature of the editorial responsibility (in addition
to the four section editors named on the title page, Learned
Information’s book editor is named on the verso) demonstrates
the wisdom of the cataloging rule that if a work has more
than three authors or editors, it should get title main entry, with
an added entry for only the first-named editor (AACR2, 1988,
rule 21.6C2).

It seems that the title and the order of editors named on the title
page were changed shortly before the book appeared. A “bibliographic
ghost” was thus created by Mulvany, one of the
contributors, who cited several papers from this collection in her
monograph Indexing Books, which appeared in April 1994,
approximately two months after the publication of this book. In
several references, Mulvany cites Indexing Electronic Images 
and Text, edited by T. Bellardo et al., with a publication date of 
1993 and no indication that the work was in press. Subse- 
quently, two words were added to the beginning of the title, the 
order of “Text” and “Images” in the title was changed, Bellardo 
changed her surname to Hahn, Fidel became first-named editor 
(apparently because the editors are listed alphabetically), and 
the book did not appear until 1994! 

It is unfortunate that Cataloging-in-Publication (CIP) Data 
was not included in this collection. Publishers are often un- 
aware of the value of CIP, but the library science professors who 
edited the work should be. Collections of conference papers of- 
ten lie uncataloged in the backrooms of libraries; the publica- 
tion of classification data on the verso of a title page allows such 
works to be processed immediately. CIP records are included 
in the MARC (machine-readable cataloging) database; MARC 
records constitute a valuable form of advertising, as such re- 
cords are mounted immediately on the major bibliographic uti- 
lities. OCLC was searched in April 1994, two months after the 
book was received, and a Library of Congress record for this 
book was not found; had CIP been requested, the record would 
have been available prior to publication. 

The editors seem to have been unable to get the contributors 
to use a consistent bibliographic format: some use the numeric 
system of citation; others use the author-date system. While it
is admittedly difficult for an editor to convert one to the other,
many variations in sequence of elements and punctuation
could have been unified without too much effort. Some of the
numeric references contain textual notes, but I question the
wisdom of applying the heading “NOTES” to the alphabetically
arranged lists of bibliographic references, simply to have a
consistent heading at the end of all the chapters.

The extent of documentation varies greatly throughout the 
volume. Some chapters have only a few notes; others are fol- 
lowed by notes as well as a bibliography. In the information 
science literature as a whole, descriptions of a single system of- 
ten have few references, while research papers have many. Bai- 
ley’s description of indexing at Petroleum Abstracts fits the for- 
mer pattern, with a single reference to an unpublished docu- 
ment. Buchan’s history of the NASA Thesaurus, in contrast, is 
followed by a six-page bibliography which seems to list every 
report that ever mentioned the thesaurus. An editor might have 
questioned the need for Buchan to provide 23 self-citations, 
including many unpublished papers. Dates are missing in four 
of Buchan’s bibliographic entries, in three of Silvester and Gen- 
uardi’s as well as three out of Liebscher’s five references, and in 
one of Mulvany’s and Smith’s; apparently no editor asked the 
contributors to supply this important data element. 

In terms of English grammar and spelling, the text is well 
edited and well proofread. There is a negligible number of
typographical errors. (If you really want to nitpick, there are
superfluous periods in several occurrences of “et. al.” on p. 259!)
The editor should have unified the orthography of such terms
as online, which is spelled “on-line” in some chapters. There
are a few errors in technical terms that the authors did not pick
up in their proofreading and which the book editor could not
be expected to correct. Rasmussen, for example, mentions “a
system’s capability of retrieving only relative documents” (p.
241); the italicized term should be relevant. Some of the
contributors also did not do a careful job of proofreading their
bibliographic references. Genuardi’s name is misspelled in one 
of the references to his own chapter, and (most unforgivable
of all) Buchan misspelled my name in his bibliography!

A book designer is credited on the verso of the title page, as 
is the designer of the cover. The volume is very attractive, with 
good quality paper, clearly reproduced photographs, and beau- 
tifully framed section title pages. There is a generous gutter 
margin, which makes it easier to read a book and to get a clear 
photocopy of a page. A generous gutter margin also facilitates 
rebinding, although the hardback cover is of such high quality 
that the volume will probably never need rebinding. 

A reviewer of another Learned Information publication 
noted that the size of type used for notes was too small and that 
the reduction ratio of some figures impeded legibility (Tillett, 
1993). The same comments could be applied to this work. The 
Appendix to Busch’s paper, a table with three columns, is in 
“near-print.” The table could have been set sideways, or a
larger format for the book would have eliminated the problem. 
Bad breaks were not corrected. For example, page 180 begins 
with the partial word “tor,” and page 181 with “acter”—resi- 
dues of the hyphenated words “descriptor” and “character,” 
respectively. 

The book’s design has several flaws from an information 
perspective. The individual papers are numbered sequentially 
as chapters throughout the book. The sequence of papers is log- 
ical on the whole, but they are largely self-contained docu- 
ments, and one may therefore question the chapter designation 
and numbering. These devices allowed for cross-references be- 
tween papers, but the chapter number was not included in the 
running heads, which display authors’ names on left-hand 
pages and brief titles on right-hand pages. To follow up the 
chapter cross-references, one must thus return to the table of 
contents. Running heads are not found in the Introductions to 
the four sections, nor does the title of each section appear above 
the word “Introduction.” 

A more serious omission is a running foot for the title of the 
work. The basic bibliographic data required to cite any of the 
papers is not given even at the head of each chapter. As elec- 
tronic journals proliferate, it is increasingly being recognized 
that the individual article is the significant unit of information; 
the same goes for collections of conference papers. If these are 
analyzed in catalogs and indexes, copies of single papers are 
likely to be requested through interlibrary loan or document 
delivery services. It is therefore desirable to have not only full
bibliographic data for the source on the title page of each piece,
but also notes on prior publication and/or presentation,
which in this volume are either contained in an editor’s
introduction or lacking entirely.

No editor intervened to change such phrases as “The past 
year,” for example, in Mulvany’s paper (p. 91). It is unlikely 
that the paper was presented in 1993, as one might infer from 
the date of publication, and the currency of the information is 
thus unclear. Humphrey uses the phrase “has just been com- 
pleted” (p. 172) in referring to the evaluation project for Med- 
IndEx. Since her paper includes a 1993 reference, it seems that 
hers was updated more recently than the others in the collection.
Silvester and Genuardi state in the preface to their appendix
of stopwords: “Only one change has been made since this
list was instituted” (p. 216). When was that? Given the
fast-changing nature of this field, dates were essential in all of these
cases.

A Directory of Contributors precedes the Preface. (Neither 
section is listed in the Table of Contents.) The byline on each 
chapter gives name alone. It would have been preferable to
include author affiliation with each paper. As noted above,
biographical data on some of the contributors is contained in the
section editors’ introductions, resulting in an unfortunate
scatter of the information pertaining to a single paper.

The eight-page index, whose compiler is not named, has a 
professional look, with indented subheadings and no lengthy, 
unanalyzed strings of page references. There are minor flaws 
in the syndetic (cross-reference) structure. Several of the see 
references could have been converted to double entries without 
adding lines to the index. A case in point is “pre-/post-coordination,
See coordinate indexing”; the latter heading has the
simple locator “163-164.” In contrast to this, the subheadings
of “National Library of Medicine” are duplicated under 
“NLM.” 

There is an error in the see reference from “machine-aided 
indexing” to “computer-aided indexing,” as the heading is
actually “computer-assisted indexing.” The latter is filed after
“computer documentation,” presumably by software that does 
not treat a hyphen as a space. The same filing rule leads one to 
believe that “library use, See information-seeking behavior” is 
a blind reference because the latter term is not found in its
expected position between “information needs” and “Information
Storage. . .” but rather in the next column, after “information
types.” There is no introductory note explaining this
filing rule.
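The filing behavior described above can be illustrated with a small sketch. It assumes word-by-word filing implemented as a plain comparison of lowercased character codes, in which a space sorts before every letter; the names file_key, file_headings, and HEADINGS are hypothetical, since the software actually used to sort this index is not identified. If the hyphen is kept as an ordinary character, it sorts after the space, so each hyphenated compound drops below every entry that begins with the same word followed by a space.

```python
def file_key(heading: str, hyphen_as_space: bool) -> str:
    """Hypothetical word-by-word filing key (an assumption, not the
    indexer's actual software).

    Headings are compared character by character after lowercasing.
    The space (code point 32) sorts before every letter, which yields
    "nothing before something" filing. The hyphen (code point 45) also
    sorts before letters but after the space, unless it is first
    converted to a space.
    """
    key = heading.lower()
    if hyphen_as_space:
        key = key.replace("-", " ")
    return key


# Headings mentioned in the review, deliberately given out of order.
HEADINGS = [
    "information types",
    "computer documentation",
    "Information Storage",
    "computer-assisted indexing",
    "information-seeking behavior",
    "information needs",
]


def file_headings(headings, hyphen_as_space):
    """Return the headings in filing order under the chosen hyphen rule."""
    return sorted(headings, key=lambda h: file_key(h, hyphen_as_space))


if __name__ == "__main__":
    for rule in (True, False):
        print("hyphen as space:" if rule else "hyphen as character:")
        for heading in file_headings(HEADINGS, rule):
            print("   ", heading)
```

Under the second rule, “computer-assisted indexing” follows “computer documentation” and “information-seeking behavior” falls after “information types,” the positions the index actually uses; under the first rule, both headings appear where a reader would expect them.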

The indexer uses the command See, rather than See under,
to refer from a heading to a subheading, for example,
“consistency, See indexing quality,” when “consistency” is a
subheading, not a synonym, of “indexing quality.” There are
authorities on indexing (Wellisch, 1991, p. 82) who consider this
acceptable, but I know of none for the unusual pattern of See
also references to multiple terms that is found in this index:
“descriptors, See also controlled vocabulary or thesauri.” This
reference may be misinterpreted to mean that the reader can
get all the information s/he seeks by looking under either
heading. The more common pattern is to place semicolons between
the terms.

As is general practice, a colon separates heading and
subheading in See also references to such combinations; this
punctuation is lacking, however, in the reference “queries, See also
searching interface,” as the latter is not a compound heading in
the index; “interface” is a subheading of “searching.”

When See also references emanate from a main heading 
with locators, the indexer generally positioned the locators be- 
fore the references; in the case of “indexing,” however, the lo- 
cator 281 follows the See also references. (The locator may in 
fact be an error, as the text discusses WAIS on that page.) 

Such minor errors in format do not significantly affect use 
of the index. The most serious formatting flaw is an error of 
omission by the typesetter, who apparently uploaded the index- 
er’s disk and failed to add continuation headings to pages that 
begin with subheadings; half the columns in the index are 
marred in this way. For example, the first line on page 302 is 
“online reference manuals”; this is actually a subheading of
“indexing aids.” If the indexer did not have the opportunity to 
check the format of the index, the section editors should have; 
it is predictable that the index to a book on indexing will be 
examined by a reviewer. 

If no review was done of the external format of the index, it
seems that even less attention was paid to its completeness. A 
category of entries completely absent from the index is names 
of authors whose work is discussed or cited in the text. There is 
no merged alphabetic list of references at the end of the book, 
which in any case would only allow one to determine that a 
given work was cited, not where or by whom. One could scan 
the references at the end of each chapter; those that are ar- 
ranged alphabetically facilitate location of a given reference, 
and those that are arranged numerically allow the reader to lo- 
cate the discussion in the text. 

Even going through the reference lists in the 15 chapters 
would not do the whole job. As noted above, Hahn’s introduc- 
tion discusses works on indexing images without formal refer- 
ences. It would thus have been desirable, for example, to find 
an entry for “Shatford, Sara” in the index and be led to Hahn’s 
discussion of Shatford’s work (p. 4) as well as Keister’s refer- 
ence to the same paper (p. 13). 

Because citation analysis forms such a large component of 
the information science literature, ASIS editors should be 
aware of the importance of citation indexing and the many 
functions it serves, such as the study of interdisciplinarity. One 
of the first things I wondered when I received this book was
whether Shatford’s work would be cited, given that her paper
appeared in a cataloging journal with a small circulation. Information
scientists may think that the indexing of cited references is
a new idea, emanating from the publication of Science Citation
Index several decades ago, but a manual of indexing published
at the turn of the century includes the following rule: “Authorities
quoted or referred to in a book, to be indexed under each
author’s name. . .” (Wheatley, 1902, p. 159).

An index user can quickly determine that name entries are 
lacking in an index. It is misleading, however, when an index 
includes subject entries but fails to identify all occurrences of a 
topic. There are numerous cases of this problem in the work 
under review. 

To take a simple proper name entry, “Art and Architecture 
Thesaurus (AAT)” has only a single locator: 67. Page 17 of the 
book, however, notes that the AAT is one of four thesauri used 
by the National Library of Medicine for its picture retrieval 
system. Whereas a list of four titles may be considered marginal 
in another context, surely any mention of the AAT is signifi- 
cant in a book on indexing images. Rasmussen’s introduction 
and Harman’s paper summarize the key findings of the Cran- 
field Tests. This corporate heading is not indexed, although 
analogous ones, such as “TIPSTER,” are. 

To take an example of inconsistent indexing of a topical 
term: Marchionini discusses use of a controlled vocabulary in 
hypertext (p. 83ff), and Mulvany mentions controlled vocabu- 
lary in the context of online help systems (p. 98). These pas- 
sages are indexed neither under “controlled vocabulary” nor 
under “thesauri.” One of the most interesting themes that re- 
curs throughout the collection is that controlled vocabularies 
are necessary in electronic environments. The inability to re- 
trieve all of these passages via the index is a serious problem. 

One more example: The number of stop words used in the 
Connection Machine system is discussed on page 279. Although
the index includes the entry “stopwords,” this locator
is lacking. Having encountered the index heading “controlled 
vocabulary” or “stopwords,” most users will assume complete 
recall of the passages on these topics. Intner (1984) has pointed 
out that such incomplete indexes constitute a form of censor- 
ship. 

I have taken the time (and the space) to comment on what
some may consider minor aspects of a book because I believe 
that ASIS publications should be models of editorial quality 
and compliance with information standards. 

At $39.50 the book is a bargain, especially as compared with 
unedited, camera-ready collections of conference papers
on related themes. (For a critical review of such a collection 
priced at $128.50, see Weinberg [1993a].) The very reasonable 
price makes ownership of this book accessible to individuals as 
well as to libraries. 

The work belongs in the collections of all schools of library 
and information science. Most of the chapters are not suitable 
readings for introductory courses on indexing, but many can 
serve as auxiliary readings for advanced courses on the organi- 
zation of information. Practitioners involved in the develop- 
ment of full-text, image, or hypertext databases will find useful 
ideas in this collection. 


It is hoped that the volume will reach those who are at- 
tempting to provide access to these new media. A recent article 
by a photo researcher (Russell, 1994) called for the develop- 
ment of standards for “keywording,” that is, indexing of im- 
ages; the author seemed to be oblivious to the vast corpus of 
relevant publications. 

ASIS and the volume’s editors are to be congratulated for 
producing an important contribution to the indexing literature. 
It has been noted that many professional associations claim to 
be the primary source of know-how on the organization of in- 
formation (Weinberg, 1993b). This collection makes ASIS a 
strong contender for that title. 


Bella Hass Weinberg 

Division of Library and Information Science 
St. John’s University 

Jamaica, NY 11439 


REFERENCES 


AACR2. (1988). Anglo-American cataloguing rules (2nd ed., 1988 
rev.). Chicago: American Library Association. 

Hodge, G. M. (1992). Automated support to indexing. Philadelphia, 
PA: National Federation of Abstracting and Information Services. 
(1992 NFAIS Report Series.) 

Intner, S. S. (1984). Censorship in indexing. The Indexer, 14, 105-108. 

Mulvany, N. C. (1994). Indexing books. Chicago: University of
Chicago Press.

NISO. (1993). National Information Standards Organization.
Proposed American national standard guidelines for indexes and related
information retrieval devices. Oxon Hill, MD: NISO Press. (Z39.4.)

NISO. (1994). National Information Standards Organization.
Guidelines for the construction, format, and management of monolingual
thesauri. Oxon Hill, MD: NISO Press, forthcoming. (Z39.19-1993.)

Russell, A. (1994). Keywording: The key to photo marketing. Key 
Words: The Newsletter of the American Society of Indexers, 2(1), 1,
21.

Tillett, B. B. (1993). Review of Cataloging heresy: Challenging the stan- 
dard bibliographic product. College & Research Libraries, 54, 77-78. 

Weinberg, B. H. (1990). Vogue words in information science. Bulletin 
of the American Society for Information Science, 16(4), 15.

Weinberg, B. H. (1993a). Review of Classification Research for
Knowledge Representation and Organization: Proceedings of the 5th
International Study Conference on Classification Research. Information
Technology and Libraries, 12, 282.

Weinberg, B. H. (1993b). The American Society of Indexers: History, 
current activities, and relationship to ASIS. Bulletin of the American 
Society for Information Science, 19(3), 23-24. 

Weinberg, B. H., & Cunningham, J. A. (1988). The design of online 
thesauri. National Online Meeting Proceedings (pp. 411-419). Med- 
ford, NJ: Learned Information, Inc. 

Wellisch, H. H. (1991). Indexing from A to Z. Bronx, NY: H. W. Wil- 
son. 

Wheatley, H. B. (1902). How to make an index. London: E. Stock.


Strategic Management for Academic Libraries. Robert M. 
Hayes. Westport, CT: Greenwood Press; 1993: 218 pp. $55.00. 
(ISBN 0-313-28111-4.)


Why a book about strategic management at a time when
most write about strategic planning? Robert Hayes has a
reputation for leading the way on many professional issues, and this
book will only enhance that reputation. He believes that to- 

day’s organizational environment changes so rapidly that if one 
only focuses on strategic planning the organization will quickly 
fall behind. In his view, what academic libraries need to do is 
engage in strategic management. He sees planning as playing a 
part but seldom the leading role in strategic management. In 
essence, strategic planning narrows management’s focus at a 
time when a wider point of view is necessary. 

This book grew out of a Council on Library Resources grant 
on “strategic planning for libraries and information resources 
in the research library” (p. ix). This is not simply a rewritten 
final report but a well-written comprehensive text covering 
concepts, contexts, and techniques. While the focus is on the 
research library environment, most of the material applies 
equally well to other types of libraries. 

A publication that is somewhat similar in character is a spe- 
cial issue of the Journal of Library Administration (Williams, 
1990) on “Strategic Planning in Higher Education: Implementing
New Roles for the Academic Library.” Although that
special issue is almost as long as this book, Dr. Hayes’ coverage is
much broader and, from a university library director’s point of
view, more useful, in that it provides a unified and
integrated approach to the real issues in strategic thinking. He
is correct that very often planning cannot keep pace with actual
developments, especially in the information technology area.
Libraries often face the need to implement an activity now or
be bypassed. Planning for implementation of technology three
or four years in the future is more likely to lead to failure than
to success. His definition of strategic management outlines
the major concepts in the book: “Strategic management is that
part of the general management of organizations that emphasizes
the relationships of external environments, evaluates the
current status and the effects of future changes in them, and
determines the most appropriate organizational response”
(p. 5).

In the chapter on top management’s role and responsibility,
he demonstrates the relationships of strategic management
activities to a variety of contexts, such as political and
administrative ones. He provides in-depth discussions and linkages in
later chapters about concerns (contexts) of internal management, of
constituencies served, and of the external environment.
Throughout the text there is an emphasis on information technology
and its impact on academic institutions and libraries.

Part III, “Techniques,” is, in this reviewer’s mind, the most
useful section of a thoroughly useful text. Certainly many of
the techniques are ones often found in good comprehensive
management books. Even those that are at least familiar by name,
however, are seldom used on a regular basis by most university
library administrators. His illustrations of how the techniques
apply in a library setting should provide some
motivation, if not inspiration, to make use of them. While he
does not go into depth on how to employ the techniques, his
copious footnotes provide leads to more in-depth presentations.

This would be a good addition to any library staff develop- 
ment collection and would also be useful as a supplemental text 
in a general library management course. 


G. Edward Evans 

University Librarian 

Loyola Marymount University 
Loyola Boulevard & West 80th Street 
Los Angeles, CA 90045-2699 


Reference 


Williams, J. F. (Guest Editor). (1990). Strategic planning in higher
education: Implementing new roles for the academic library. Journal of
Library Administration, 13.




